Overview

Dataset Statistics

Number of Variables 26
Number of Rows 7010
Missing Cells 0
Missing Cells (%) 0.0%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 4.1 MB
Average Row Size in Memory 610.9 B
Variable Types
  • Categorical: 16
  • Numerical: 9
  • GeoGraphy: 1
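
The two memory figures above can be cross-checked against the row count. The snippet below is a minimal sketch; it assumes that "MB" in this report means 2**20 bytes, which the report itself does not state.

```python
# Values copied from the Dataset Statistics above.
n_rows = 7010
avg_row_bytes = 610.9

# Total size implied by the average row size.
total_bytes = n_rows * avg_row_bytes

# Assumption: "MB" in the report means 2**20 bytes (MiB).
total_mib = total_bytes / 2**20
print(f"{total_bytes:,.0f} B is about {total_mib:.1f} MiB")
```

Under that assumption the implied total rounds to the reported 4.1 MB.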

Dataset Insights

Patient ID has a high cardinality: 7010 distinct values [High Cardinality]
Blood Pressure has a high cardinality: 3590 distinct values [High Cardinality]
Patient ID has constant length 7 [Constant Length]
Diabetes has constant length 1 [Constant Length]
Family History has constant length 1 [Constant Length]
Smoking has constant length 1 [Constant Length]
Obesity has constant length 1 [Constant Length]
Alcohol Consumption has constant length 1 [Constant Length]
Previous Heart Problems has constant length 1 [Constant Length]
Medication Use has constant length 1 [Constant Length]
Physical Activity Days Per Week has constant length 1 [Constant Length]
Hemisphere has constant length 19 [Constant Length]
Heart Attack Risk has constant length 1 [Constant Length]
Patient ID has all distinct values [Unique]
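
Insights of the kind listed above (high cardinality, constant length, all distinct) can be derived from simple per-column checks. The following is an illustrative sketch, not the report generator's own logic: the function name `column_insights` and the cardinality threshold are hypothetical.

```python
def column_insights(name, values, high_cardinality_threshold=50):
    """Return insight strings for one column of string values.

    The threshold default is a hypothetical value chosen for illustration.
    """
    insights = []
    if not values:
        return insights
    distinct = len(set(values))
    if distinct > high_cardinality_threshold:
        insights.append(f"{name} has a high cardinality: {distinct} distinct values")
    lengths = {len(v) for v in values}
    if len(lengths) == 1:
        insights.append(f"{name} has constant length {lengths.pop()}")
    if distinct == len(values):
        insights.append(f"{name} has all distinct values")
    return insights


# The five Patient ID sample rows from the Variables section of this report.
sample_ids = ["RDG0550", "NMA3851", "TUI5807", "YYT5016", "ZAC5937"]

# A tiny threshold is used here only because the sample has five rows.
patient_id_insights = column_insights("Patient ID", sample_ids, high_cardinality_threshold=3)
for line in patient_id_insights:
    print(line)
```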

Variables


Patient ID

categorical

Approximate Distinct Count 7010
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Memory Size 504720

Length

Mean 7
Standard Deviation 0
Median 7
Minimum 7
Maximum 7

Sample

1st row RDG0550
2nd row NMA3851
3rd row TUI5807
4th row YYT5016
5th row ZAC5937

Letter

Count 21030
Lowercase Letter 0
Space Separator 0
Uppercase Letter 21030
Dash Punctuation 0
Decimal Number 28040
  • Patient ID contains many words: 7010 words
  • Patient ID has words of constant length
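
The Letter table counts characters by Unicode general category. A minimal sketch of that kind of count, using Python's standard `unicodedata` module on the five sample rows (the helper name `category_counts` is an illustration, not part of the report tool):

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count characters by Unicode general category across all strings."""
    return Counter(unicodedata.category(ch) for v in values for ch in v)


sample_ids = ["RDG0550", "NMA3851", "TUI5807", "YYT5016", "ZAC5937"]
counts = category_counts(sample_ids)

# Lu = Uppercase Letter, Ll = Lowercase Letter, Nd = Decimal Number
print(counts["Lu"], counts["Ll"], counts["Nd"])
```

Each sample ID has three uppercase letters followed by four digits, which matches the full-column totals: 21030 = 3 × 7010 and 28040 = 4 × 7010.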

Age

numerical

Approximate Distinct Count 73
Approximate Unique (%) 1.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 112160
Mean 53.5104
Minimum 18
Maximum 90
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Age is skewed right (γ1 = 0.0339)

Quantile Statistics

Minimum 18
5-th Percentile 21
Q1 35
Median 53
Q3 72
95-th Percentile 87
Maximum 90
Range 72
IQR 37
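
Range and IQR follow directly from the quantiles above. The sketch below recomputes them and, as an addition the report itself does not make, applies the conventional 1.5 × IQR fences to see whether the extremes would count as outliers.

```python
# Quantiles for Age, copied from the table above.
minimum, q1, q3, maximum = 18, 35, 72, 90

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # interquartile range

# Conventional 1.5 * IQR fences (an added check, not part of the report).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
no_outliers = lower_fence <= minimum and maximum <= upper_fence

print(value_range, iqr, lower_fence, upper_fence, no_outliers)
```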

Descriptive Statistics

Mean 53.5104
Standard Deviation 21.291
Variance 453.3049
Sum 375108
Skewness 0.03386
Kurtosis -1.2079
Coefficient of Variation 0.3979
  • Age is not normally distributed (p-value 3.67e-30)
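
The report does not say which normality test produced this p-value. As an independent illustration, the Jarque-Bera statistic can be computed from the reported skewness and kurtosis alone. This is a substitute test, so its p-value is not expected to match the one above; the sketch also assumes the reported Kurtosis is excess kurtosis, which the negative value suggests but the report does not state.

```python
import math


def jarque_bera(n, skewness, excess_kurtosis):
    """Jarque-Bera statistic and its asymptotic p-value (chi-square, 2 df)."""
    jb = n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    # Survival function of a chi-square distribution with 2 degrees of freedom.
    p_value = math.exp(-jb / 2.0)
    return jb, p_value


# n, Skewness and Kurtosis for Age, copied from this report.
jb, p_value = jarque_bera(7010, 0.03386, -1.2079)
print(f"JB = {jb:.1f}, p = {p_value:.3g}")
```

The statistic is driven almost entirely by the kurtosis term; the skewness is close to zero.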

Sex

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 487928
  • The largest value (Male) is over 2.31 times larger than the second largest value (Female)

Length

Mean 4.6046
Standard Deviation 0.9186
Median 4
Minimum 4
Maximum 6

Sample

1st row Male
2nd row Female
3rd row Female
4th row Female
5th row Female

Letter

Count 32278
Lowercase Letter 25268
Space Separator 0
Uppercase Letter 7010
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Male, Female) take over 50.0%
  • The most frequent word (male) occurs over 2.31 times as often as the second most frequent word (female)
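
The category counts behind the 2.31 ratio are not printed, but they can be recovered from the Letter totals. The sketch assumes the only two values are "Male" (4 characters) and "Female" (6 characters), which is consistent with the distinct count and the length minimum and maximum above.

```python
n_rows = 7010
total_chars = 32278          # "Count" in the Letter table
len_male, len_female = 4, 6  # assumed value lengths ("Male", "Female")

# Solve: males + females = n_rows and 4*males + 6*females = total_chars.
females = (total_chars - len_male * n_rows) // (len_female - len_male)
males = n_rows - females
ratio = males / females

# Cross-check against the Lowercase Letter total (3 per "Male", 5 per "Female").
lowercase = 3 * males + 5 * females
print(males, females, round(ratio, 2), lowercase)
```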

Cholesterol

numerical

Approximate Distinct Count 281
Approximate Unique (%) 4.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 112160
Mean 259.8807
Minimum 120
Maximum 400
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Cholesterol is skewed left (γ1 = -0.0052)

Quantile Statistics

Minimum 120
5-th Percentile 132
Q1 192
Median 259
Q3 329
95-th Percentile 385
Maximum 400
Range 280
IQR 137

Descriptive Statistics

Mean 259.8807
Standard Deviation 80.7092
Variance 6513.9826
Sum 1.8218e+06
Skewness -0.005214
Kurtosis -1.1765
Coefficient of Variation 0.3106
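
Several of the descriptive statistics are functions of one another, so they can be checked for internal consistency. A small sketch using the Cholesterol figures above, with tolerances that allow for the report's rounding:

```python
import math

# Values for Cholesterol, copied from the tables above.
n = 7010
mean = 259.8807
std = 80.7092
variance = 6513.9826
total = 1.8218e6   # "Sum", shown rounded in the report

# Variance is the square of the standard deviation.
variance_ok = math.isclose(std ** 2, variance, rel_tol=1e-4)

# Sum is the mean times the number of rows.
sum_ok = math.isclose(n * mean, total, rel_tol=1e-4)

# Coefficient of variation is standard deviation divided by mean.
cv = std / mean
print(variance_ok, sum_ok, round(cv, 4))
```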

Blood Pressure

categorical

Approximate Distinct Count 3590
Approximate Unique (%) 51.2%
Missing 0
Missing (%) 0.0%
Memory Size 498499

Length

Mean 6.1126
Standard Deviation 0.5193
Median 6
Minimum 5
Maximum 7

Sample

1st row 129/90
2nd row 159/105
3rd row 161/109
4th row 120/62
5th row 153/110

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 35839
  • Blood Pressure contains many words: 3590 words
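
Blood Pressure is profiled as categorical because each value is a single string such as 129/90. Splitting it into two integers makes it usable as numeric data. The sketch below assumes the conventional systolic/diastolic order, which the report does not state; `parse_blood_pressure` is a hypothetical helper name.

```python
def parse_blood_pressure(reading):
    """Split a 'systolic/diastolic' string into two integers.

    Assumption: the first number is systolic and the second diastolic.
    """
    systolic_text, diastolic_text = reading.split("/")
    return int(systolic_text), int(diastolic_text)


# The five sample rows shown above.
samples = ["129/90", "159/105", "161/109", "120/62", "153/110"]
parsed = [parse_blood_pressure(s) for s in samples]
print(parsed)
```

The slash is not a dash in Unicode terms, which is why Dash Punctuation is 0 while every remaining character is counted under Decimal Number.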

Heart Rate

numerical

Approximate Distinct Count 71
Approximate Unique (%) 1.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 112160
Mean 75.106
Minimum 40
Maximum 110
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Heart Rate is skewed left (γ1 = -0.0052)

Quantile Statistics

Minimum 40
5-th Percentile 43
Q1 57
Median 75
Q3 93
95-th Percentile 107
Maximum 110
Range 70
IQR 36

Descriptive Statistics

Mean 75.106
Standard Deviation 20.5072
Variance 420.5436
Sum 526493
Skewness -0.0052
Kurtosis -1.2073
Coefficient of Variation 0.273
  • Heart Rate is not normally distributed (p-value 7.85e-26)
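
Near-zero skewness combined with kurtosis close to -1.2 is what a flat, evenly spread distribution looks like. For comparison only, the sketch below computes the theoretical moments of a discrete uniform distribution on the observed range 40 to 110. It is not a claim about how the data was produced, and it assumes the reported Kurtosis is excess kurtosis.

```python
import math


def discrete_uniform_moments(low, high):
    """Mean, standard deviation and excess kurtosis of a discrete uniform
    distribution on the integers low..high (inclusive)."""
    n = high - low + 1
    mean = (low + high) / 2
    std = math.sqrt((n ** 2 - 1) / 12)
    excess_kurtosis = -6 * (n ** 2 + 1) / (5 * (n ** 2 - 1))
    return mean, std, excess_kurtosis


mean, std, kurt = discrete_uniform_moments(40, 110)
print(f"mean={mean:.3f} std={std:.4f} kurtosis={kurt:.4f}")
```

The reported values (mean 75.106, standard deviation 20.5072, kurtosis -1.2073) sit close to these reference figures.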

Diabetes

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 462660
  • The largest value (1) is over 1.88 times larger than the second largest value (0)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 0
2nd row 1
3rd row 0
4th row 0
5th row 1

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 7010
  • The top 2 categories (1, 0) take over 50.0%
  • Diabetes has words of constant length

Family History

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 462660

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 1
2nd row 0
3rd row 1
4th row 1
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 7010
  • The top 2 categories (0, 1) take over 50.0%
  • Family History has words of constant length

Smoking

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 462660
  • The largest value (1) is over 8.64 times larger than the second largest value (0)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 1
2nd row 1
3rd row 0
4th row 1
5th row 1

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 7010
  • The top 2 categories (1, 0) take over 50.0%
  • Smoking has words of constant length

Obesity

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 462660

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 1
2nd row 0
3rd row 0
4th row 1
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 7010
  • The top 2 categories (0, 1) take over 50.0%
  • Obesity has words of constant length

Alcohol Consumption

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 462660

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 1
2nd row 0
3rd row 1
4th row 1
5th row 1

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 7010
  • The top 2 categories (1, 0) take over 50.0%
  • Alcohol Consumption has words of constant length

Exercise Hours Per Week

numerical

Approximate Distinct Count 7010
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 112160
Mean 9.9791
Minimum 0.002442
Maximum 19.9987
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Exercise Hours Per Week is skewed left (γ1 = -0.0039)

Quantile Statistics

Minimum 0.002442
5-th Percentile 0.9221
Q1 5.046
Median 9.983
Q3 15.0297
95-th Percentile 18.9044
Maximum 19.9987
Range 19.9963
IQR 9.9836

Descriptive Statistics

Mean 9.9791
Standard Deviation 5.7697
Variance 33.2897
Sum 69953.5575
Skewness -0.003878
Kurtosis -1.1966
Coefficient of Variation 0.5782

Diet

categorical

Approximate Distinct Count 3
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 509356

Length

Mean 7.6613
Standard Deviation 0.941
Median 7
Minimum 7
Maximum 9

Sample

1st row Unhealthy
2nd row Average
3rd row Average
4th row Healthy
5th row Healthy

Letter

Count 53706
Lowercase Letter 46696
Space Separator 0
Uppercase Letter 7010
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Healthy, Average) take over 50.0%

Previous Heart Problems

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 462660

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 0
2nd row 1
3rd row 1
4th row 0
5th row 1

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 7010
  • The top 2 categories (0, 1) take over 50.0%
  • Previous Heart Problems has words of constant length

Medication Use

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 462660

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 1
2nd row 0
3rd row 1
4th row 1
5th row 0

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 7010
  • The top 2 categories (1, 0) take over 50.0%
  • Medication Use has words of constant length

Stress Level

numerical

Approximate Distinct Count 10
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 112160
Mean 5.4518
Minimum 1
Maximum 10
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Stress Level is skewed right (γ1 = 0.0128)

Quantile Statistics

Minimum 1
5-th Percentile 1
Q1 3
Median 5
Q3 8
95-th Percentile 10
Maximum 10
Range 9
IQR 5

Descriptive Statistics

Mean 5.4518
Standard Deviation 2.858
Variance 8.1681
Sum 38217
Skewness 0.01283
Kurtosis -1.2163
Coefficient of Variation 0.5242
  • Stress Level is not normally distributed (p-value 4.22e-04)

Sedentary Hours Per Day

numerical

Approximate Distinct Count 7010
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 112160
Mean 5.994
Minimum 0.001263
Maximum 11.9993
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Sedentary Hours Per Day is skewed right (γ1 = 0.0182)

Quantile Statistics

Minimum 0.001263
5-th Percentile 0.6132
Q1 2.9718
Median 5.9369
Q3 9.0176
95-th Percentile 11.4302
Maximum 11.9993
Range 11.9981
IQR 6.0458

Descriptive Statistics

Mean 5.994
Standard Deviation 3.472
Variance 12.0549
Sum 42017.987
Skewness 0.01818
Kurtosis -1.1981
Coefficient of Variation 0.5792

Income

numerical

Approximate Distinct Count 6921
Approximate Unique (%) 98.7%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 112160
Mean 158245.3489
Minimum 20062
Maximum 299954
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Income is skewed right (γ1 = 0.0224)

Quantile Statistics

Minimum 20062
5-th Percentile 32875.25
Q1 88368
Median 157378.5
Q3 227218.5
95-th Percentile 285837.65
Maximum 299954
Range 279892
IQR 138850.5

Descriptive Statistics

Mean 158245.3489
Standard Deviation 80585.3167
Variance 6.494e+09
Sum 1.1093e+09
Skewness 0.02237
Kurtosis -1.1816
Coefficient of Variation 0.5092

BMI

numerical

Approximate Distinct Count 7010
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 112160
Mean 28.8787
Minimum 18.0023
Maximum 39.9936
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • BMI is skewed right (γ1 = 0.0389)

Quantile Statistics

Minimum 18.0023
5-th Percentile 19.0806
Q1 23.4223
Median 28.7376
Q3 34.3212
95-th Percentile 38.9105
Maximum 39.9936
Range 21.9912
IQR 10.8989

Descriptive Statistics

Mean 28.8787
Standard Deviation 6.3224
Variance 39.9727
Sum 202439.6308
Skewness 0.03888
Kurtosis -1.186
Coefficient of Variation 0.2189

Triglycerides

numerical

Approximate Distinct Count 771
Approximate Unique (%) 11.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 112160
Mean 416.782
Minimum 30
Maximum 800
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Triglycerides is skewed right (γ1 = 0.0014)

Quantile Statistics

Minimum 30
5-th Percentile 68
Q1 221
Median 416
Q3 613
95-th Percentile 765.55
Maximum 800
Range 770
IQR 392

Descriptive Statistics

Mean 416.782
Standard Deviation 224.1951
Variance 50263.4588
Sum 2.9216e+06
Skewness 0.001374
Kurtosis -1.2083
Coefficient of Variation 0.5379

Physical Activity Days Per Week

categorical

Approximate Distinct Count 8
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory Size 462660

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 6
2nd row 7
3rd row 2
4th row 0
5th row 2

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 7010
  • Physical Activity Days Per Week has words of constant length

Sleep Hours Per Day

categorical

Approximate Distinct Count 7
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory Size 463700

Length

Mean 1.1484
Standard Deviation 0.3555
Median 1
Minimum 1
Maximum 2

Sample

1st row 7
2nd row 8
3rd row 10
4th row 9
5th row 5

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 8050
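
The digit total exceeds the row count because some values have two characters (such as the 10 in the sample). Since the Length table gives a minimum of 1 and a maximum of 2, the number of two-character rows can be read off directly. A minimal sketch:

```python
n_rows = 7010
digit_count = 8050   # "Decimal Number" in the Letter table

# Every row contributes one digit; each two-character row contributes one more.
two_char_rows = digit_count - n_rows
share = two_char_rows / n_rows

# The mean length should then be 1 plus that share.
mean_length = 1 + share
print(two_char_rows, round(share, 4), round(mean_length, 4))
```

The result agrees with the reported mean length of 1.1484.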

Country

categorical

Approximate Distinct Count 20
Approximate Unique (%) 0.3%
Missing 0
Missing (%) 0.0%
Memory Size 511433

Length

Mean 7.9576
Standard Deviation 2.7837
Median 7
Minimum 5
Maximum 14

Sample

1st row Argentina
2nd row Nigeria
3rd row Thailand
4th row Spain
5th row Germany

Letter

Count 54066
Lowercase Letter 45339
Space Separator 1717
Uppercase Letter 8727
Dash Punctuation 0
Decimal Number 0

Continent

categorical

Approximate Distinct Count 6
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory Size 508242

Length

Mean 7.5024
Standard Deviation 3.4995
Median 6
Minimum 4
Maximum 13

Sample

1st row South America
2nd row Africa
3rd row Asia
4th row Europe
5th row Europe

Letter

Count 50814
Lowercase Letter 42026
Space Separator 1778
Uppercase Letter 8788
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Asia, Europe) take over 50.0%

Hemisphere

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 588840
  • The largest value (Northern Hemisphere) is over 1.82 times larger than the second largest value (Southern Hemisphere)

Length

Mean 19
Standard Deviation 0
Median 19
Minimum 19
Maximum 19

Sample

1st row Southern Hemispher...
2nd row Northern Hemispher...
3rd row Northern Hemispher...
4th row Southern Hemispher...
5th row Northern Hemispher...

Letter

Count 126180
Lowercase Letter 112160
Space Separator 7010
Uppercase Letter 14020
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Northern Hemisphere, Southern Hemisphere) take over 50.0%
  • The most frequent word (hemisphere) occurs over 1.55 times as often as the second most frequent word (northern)
  • Hemisphere has words of constant length

Heart Attack Risk

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 462660
  • The largest value (0) is over 1.8 times larger than the second largest value (1)

Length

Mean 1
Standard Deviation 0
Median 1
Minimum 1
Maximum 1

Sample

1st row 1
2nd row 1
3rd row 0
4th row 1
5th row 1

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 7010
  • The top 2 categories (0, 1) take over 50.0%
  • Heart Attack Risk has words of constant length
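
For a two-valued column, the "times larger" ratio converts directly into class shares, which is often the more useful way to read class imbalance. The sketch below applies that conversion to the ratios reported for three binary columns. Because the report states each ratio as "over" a rounded figure, the shares are approximate; `shares_from_ratio` is a hypothetical helper name.

```python
def shares_from_ratio(ratio):
    """Convert a largest-to-second-largest ratio into the two class shares
    of a two-valued column."""
    larger = ratio / (1 + ratio)
    return larger, 1 - larger


# Ratios copied from this report.
reported = [("Diabetes", 1.88), ("Smoking", 8.64), ("Heart Attack Risk", 1.8)]
for name, ratio in reported:
    larger, smaller = shares_from_ratio(ratio)
    print(f"{name}: about {larger:.1%} vs {smaller:.1%}")
```

For Heart Attack Risk this gives roughly 64% for value 0 and 36% for value 1.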

Interactions

Correlations

Missing Values